Summed Score Likelihood Based Indices for Testing Latent Variable Distribution Fit in Item Response Theory


Abstract 

In standard item response theory (IRT) applications, the latent variable is typically assumed to be normally distributed. If the normality assumption is violated, the item parameter estimates can become biased. Summed score likelihood based statistics may be useful for testing latent variable distribution fit. We develop Satorra-Bentler type (Satorra & Bentler, 1994) moment adjustments to approximate the tail-area probabilities of the test statistics. A simulation study was conducted to examine the calibration and power of the unadjusted and adjusted statistics under various conditions. Results show that, under the null hypothesis, the tail-area probabilities of the proposed indices can be closely approximated by those of central chi-squared random variables. Furthermore, the test statistics are focused: they are powerful for detecting violations of the latent variable distributional assumption and, correctly, insensitive to other forms of model misspecification such as multidimensionality. By comparison, the goodness-of-fit statistic $M_2$ (Maydeu-Olivares & Joe, 2005) has considerably lower power against latent variable nonnormality than the proposed indices. Empirical data from a patient-reported health outcomes study are used as an illustration.
Introduction

Item response theory (IRT) provides powerful methods supporting educational and psychological measurement (Thissen & Steinberg, 2009). The latent variable in IRT models is usually assumed to follow a normal distribution for the purpose of item parameter estimation (Bock & Lieberman, 1970; Bock & Aitkin, 1981). However, this assumption may be violated in some situations (Woods, 2006; Woods & Lin, 2009). Woods (2006) described several situations in which $\theta$ may be nonnormal. For example, because severe symptoms of psychological disorders are rare in the general population and most people have low levels of psychopathological symptoms, latent variables reflecting these symptoms may be positively skewed. Another possible cause is population heterogeneity: when two or more subpopulations with different means and variances are grouped together, the population distribution may be multimodal, and calibrating the items with respect to the combined population renders the normality assumption suspect. When the assumption of a normally distributed latent variable is violated, the item parameter estimates may be biased, which in turn biases subsequent inferences based on these estimates. In Computer Adaptive Testing (CAT), for example, the item parameter estimates are used for both item selection and test scoring, so bias in the estimated item parameters may produce substantial bias in the reported test scores.

Although alternative approaches exist for estimating the latent variable distribution in standard IRT models (Bock & Aitkin, 1981; Woods & Thissen, 2006; Woods & Lin, 2009), these approaches are computationally more demanding and require specialized software. For example, in our experience, the empirical histogram representation of the latent prior distribution is often numerically less stable than the standard normal prior. Thus, it is worthwhile to test the assumption of latent variable normality before more “expensive” approaches are applied. Summed score likelihood based statistics may be useful for testing latent variable distribution fit. One problem is that these statistics do not asymptotically follow a chi-squared distribution. In this paper, we propose a Satorra-Bentler type moment adjustment (Satorra & Bentler, 1994), with which the statistics' tail-area probabilities can be approximated using the item parameter error covariance matrix and a Jacobian. The properties of the adjusted and unadjusted statistics are examined in simulation and empirical studies. Additionally, a modified Lord-Wingersky algorithm for computing the Jacobian matrix is presented in the Appendix.
Item Response Theory Models 

In standard IRT models, the conditional item response probabilities (also referred to as item tracelines or item characteristic curves) are represented as a function of the latent variable $\theta$ and the item parameters. For example, the 3-parameter logistic (3PL) model can be written as:


$$T_i(1\mid\theta) = g_i + \frac{1 - g_i}{1 + \exp[-(c_i + a_i\theta)]}, \tag{1}$$


where $T_i(1\mid\theta)$ represents item $i$'s traceline for category 1 (indicating a correct or endorsement response in most contexts) as a function of $\theta$. The item parameters include $g_i$, the pseudo-guessing probability for the item (the lower asymptote parameter); $a_i$, the slope (the discrimination parameter); and $c_i$, the item intercept parameter. The classical difficulty (threshold) parameter is obtained as $b_i = -c_i/a_i$. If $g_i$ is zero, the model reduces to the 2-parameter logistic (2PL) model, and if, in addition, all item slopes are constrained to be equal to a common slope ($a_i = a$), the 1-parameter logistic (1PL) model is the result. The incorrect/non-endorsement response probability is $T_i(0\mid\theta) = 1 - T_i(1\mid\theta)$.
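As a quick numerical check of Equation (1), the sketch below evaluates the traceline with NumPy. The function name `traceline_3pl` and the item parameter values are hypothetical illustrations, not quantities from this study.

```python
import numpy as np

def traceline_3pl(theta, a, c, g=0.0):
    """P(U_i = 1 | theta) under the 3PL model of Equation (1).

    With g = 0 this is the 2PL; a common slope a across items gives the 1PL.
    """
    theta = np.asarray(theta, dtype=float)
    return g + (1.0 - g) / (1.0 + np.exp(-(c + a * theta)))

# Hypothetical item: slope 1.5, intercept -0.75, lower asymptote 0.2.
a, c, g = 1.5, -0.75, 0.2
b = -c / a                        # classical difficulty (threshold): 0.5
p1 = traceline_3pl(b, a, c, g)    # at theta = b: g + (1 - g) / 2 = 0.6
p0 = 1.0 - p1                     # non-endorsement probability T_i(0 | theta)
```

At $\theta = b_i$ the logistic part equals one half, so the traceline sits midway between the lower asymptote $g_i$ and 1.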

For an item with $K_i$ ordered polytomous response categories, the graded response model is often used. Let the response categories be coded as $k = 0, \ldots, K_i - 1$. The cumulative probability of a response in category $k$ or above for item $i$ is

$$T_i^*(k\mid\theta) = \frac{1}{1 + \exp[-(c_{ik} + a_i\theta)]} \tag{2}$$

for $k = 1, \ldots, K_i - 1$, where $c_{ik}$ is the intercept for the $k$th category boundary. Having defined the boundary cases $T_i^*(0\mid\theta) = 1$ and $T_i^*(K_i\mid\theta) = 0$, the category response probabilities can be written as

$$T_i(k\mid\theta) = T_i^*(k\mid\theta) - T_i^*(k+1\mid\theta) \tag{3}$$

for $k = 0, \ldots, K_i - 1$. Let $U_i$ be a random variable whose realization $u_i$ is a response to item $i$. Regardless of the number of categories, the probability mass function of $U_i$, conditional on $\theta$, is that of a multinomial with trial size 1:

$$P(U_i = u_i\mid\theta) = \prod_{k=0}^{K_i-1} T_i(k\mid\theta)^{1_k(u_i)}, \tag{4}$$

where $1_k(u_i)$ is an indicator function such that

$$1_k(u_i) = \begin{cases} 1, & \text{if } k = u_i, \\ 0, & \text{otherwise.} \end{cases} \tag{5}$$
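The graded response model computations in Equations (2) through (4) can be sketched as follows. This is a minimal NumPy illustration: `graded_category_probs` and the parameter values are hypothetical, and the intercepts are assumed to be ordered so that the cumulative curves do not cross.

```python
import numpy as np

def graded_category_probs(theta, a, c):
    """Category probabilities T_i(k | theta), k = 0, ..., K_i - 1 (Equations 2-3).

    a : slope a_i
    c : intercepts c_i1, ..., c_i,K_i-1, assumed decreasing so that the
        cumulative curves T*_i(k | theta) do not cross
    Returns an array of shape (number of theta values, K_i).
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    c = np.asarray(c, dtype=float)
    # Cumulative probabilities T*_i(k | theta) for k = 1, ..., K_i - 1 (Equation 2).
    inner = 1.0 / (1.0 + np.exp(-(c[None, :] + a * theta[:, None])))
    n = theta.size
    # Boundary cases T*_i(0 | theta) = 1 and T*_i(K_i | theta) = 0.
    cum = np.hstack([np.ones((n, 1)), inner, np.zeros((n, 1))])
    # Equation (3): adjacent differences of the cumulative curves.
    return cum[:, :-1] - cum[:, 1:]

# Hypothetical 4-category item evaluated at three theta values.
probs = graded_category_probs([-1.0, 0.0, 1.0], a=1.2, c=[1.0, 0.0, -1.0])

# Equation (4): with a single trial, P(U_i = u_i | theta) is simply column u_i.
u_i = 2
p_response = probs[:, u_i]
```

Because the cumulative curves telescope, each row of `probs` sums to 1, which is the multinomial-with-one-trial structure of Equation (4).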
The Latent Variable Distribution in IRT 


Estimating the latent variable distribution along with the item parameters using the empirical histogram (Bock & Aitkin, 1981; Mislevy, 1984; Zimowski, Muraki, Mislevy, & Bock, 1996) is an established strategy for detecting and correcting latent variable nonnormality in IRT. Newer semi-parametric density estimation procedures offer more efficient alternatives. These include Ramsay-Curve IRT (Woods & Thissen, 2006) and Davidian Curve IRT (Woods & Lin, 2009; Monroe & Cai, 2014), as well as its multidimensional extension (Monroe, 2014). In practice, however, estimating latent variable densities often requires specialized software. More complex latent variable distributions also involve more parameters to be estimated from the data, increasing the calibration sample size needed for stable estimation. Finally, even though nonnormal latent densities can be modeled, e.g., using a Ramsay curve IRT model (Woods & Thissen, 2006), and relative model fit can be evaluated against a baseline using likelihood ratio tests, doing so does not remove the need for absolute goodness-of-fit indices to establish the adequacy of the least restrictive model in the class of models being compared (see Maydeu-Olivares & Cai, 2006, for further explanation). It would therefore be highly desirable to have a set of statistics that can be used to diagnose the extent to which a normal (or nonnormal) latent variable distribution is in fact a reasonable characterization before more “expensive” methods and software programs for semi-parametric density estimation are employed.

In developing such a set of test statistics for latent variable distribution fit, several desiderata should be taken into account. First, the statistics should be easily computable, preferably using only standard byproducts of the item calibration process. Second, the statistics should have well-grounded heuristic motivation and theoretical justification. Third, the frequency calibration of the statistics under the null hypothesis should be sufficiently accurate. Finally, the statistics should have adequate power focused on violations of the latent variable distribution assumption, with sufficient diagnostic specificity, rather than becoming a surrogate for overall model fit tests.

The guiding insight has been provided elsewhere in the literature. For unidimensional IRT modeling, the observed and model-implied summed score distributions can serve as a basis for inferring the adequacy of the latent variable distribution specification in the IRT model (Thissen & Wainer, 2001, p. 130). After model fitting, residual summed score probabilities may be used to construct chi-squared test statistics. While the idea itself is not new (see Ferrando & Lorenzo-Seva, 2001; Hambleton & Traub, 1973; Lord, 1953; Ross, 1966; Sinharay, Johnson, & Stern, 2006, among others), we use the recently developed theory of limited-information goodness-of-fit testing to show formally that the summed score likelihood based fit index proposed here belongs to the general family of multinomial limited-information tests.

The Multinomial Sampling Model and Maximum Likelihood Estimation 
Let there be $I$ items in a test. Under the conditional independence assumption, the IRT model specifies the conditional response pattern probability as the following product:

$$P\Big(\bigcap_{i=1}^{I} U_i = u_i \,\Big|\, \theta\Big) = \prod_{i=1}^{I} P(U_i = u_i\mid\theta). \tag{6}$$

Assuming that $g(\theta)$ is the distribution of the latent variable (also known as the prior distribution), the marginal response pattern probability is the following integral:

$$P\Big(\bigcap_{i=1}^{I} U_i = u_i\Big) = \int \prod_{i=1}^{I} P(U_i = u_i\mid\theta)\, g(\theta)\, d\theta = \pi_{\mathbf u}(\boldsymbol\gamma), \tag{7}$$
i=1 i=1 


where $\mathbf u = (u_1, \ldots, u_I)'$ is the response pattern, and $\boldsymbol\gamma$ is a $d \times 1$ vector that collects the free item parameters from all $I$ items. The parenthetical notation $\pi_{\mathbf u}(\boldsymbol\gamma)$ in Equation (7) emphasizes that this probability is implied by the model: the marginal response probability depends on the item parameters, the item-level response models, and the assumed latent variable distribution.
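Equations (6) and (7) can be evaluated numerically by replacing the integral with a quadrature sum. The following sketch assumes 2PL items, a standard normal $g(\theta)$, and equally spaced rectangular quadrature; the function name and its settings (`n_quad`, `bound`) are illustrative choices, not the calibration software used in this paper.

```python
import itertools
import numpy as np

def pattern_probs_2pl(a, c, n_quad=61, bound=6.0):
    """Marginal probabilities pi_u for all 2^I response patterns (Equation 7).

    Assumes 2PL items and a standard normal g(theta); the integral is replaced
    by a sum over equally spaced nodes on [-bound, bound] with normalized weights.
    """
    a = np.asarray(a, dtype=float)
    c = np.asarray(c, dtype=float)
    nodes = np.linspace(-bound, bound, n_quad)
    weights = np.exp(-0.5 * nodes**2)
    weights /= weights.sum()
    p1 = 1.0 / (1.0 + np.exp(-(c + a * nodes[:, None])))                # (n_quad, I)
    patterns = np.array(list(itertools.product([0, 1], repeat=a.size)))  # (C, I)
    # Equation (6): conditional independence, product over items at each node.
    cond = np.prod(np.where(patterns[:, None, :] == 1, p1, 1.0 - p1), axis=2)  # (C, n_quad)
    return patterns, cond @ weights                                      # integrate over theta

patterns, pi = pattern_probs_2pl(a=[1.0, 1.5, 0.8], c=[0.0, -0.5, 0.3])
```

Patterns are enumerated in the lexicographic order 000, 001, ..., 111, with the first item leftmost.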

Recall that $K_i$ is the number of categories for item $i$. For $I$ items, the IRT model generates a total of $C = \prod_{i=1}^{I} K_i$ cross-classifications, or possible item response patterns, in the form of a contingency table. Based on a sample of $N$ respondents, let the observed proportion associated with pattern $\mathbf u$ be denoted $p_{\mathbf u}$. The sampling model for this contingency table is a multinomial distribution with $C$ cells and $N$ trials. The multinomial log-likelihood for the item parameters $\boldsymbol\gamma$ is proportional to

$$\log L(\boldsymbol\gamma) \propto N \sum_{\mathbf u} p_{\mathbf u} \log \pi_{\mathbf u}(\boldsymbol\gamma), \tag{8}$$

where the summation is over all $C$ response patterns. Maximization of the log-likelihood (e.g., with the EM algorithm; Bock & Aitkin, 1981) leads to the maximum marginal likelihood estimator $\hat{\boldsymbol\gamma}$.
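For reference, the kernel of Equation (8) takes only a few lines. This is a hypothetical NumPy helper that assumes pattern probabilities and counts are supplied in the same order; cells with zero observed count are skipped because they contribute nothing to the sum.

```python
import numpy as np

def multinomial_loglik(pi_model, counts):
    """Kernel N * sum_u p_u * log pi_u(gamma) of Equation (8)."""
    counts = np.asarray(counts, dtype=float)
    pi_model = np.asarray(pi_model, dtype=float)
    n = counts.sum()
    p = counts / n                     # observed proportions p_u
    seen = counts > 0                  # empty cells add 0 to the sum
    return n * np.sum(p[seen] * np.log(pi_model[seen]))

# Toy check with 2 cells: N = 4, p = (0.25, 0.75), model pi = (0.5, 0.5).
ll = multinomial_loglik([0.5, 0.5], [1, 3])   # equals 4 * log(0.5)
```

For a saturated model the kernel is maximized at $\pi_{\mathbf u} = p_{\mathbf u}$; an IRT model restricts $\boldsymbol\pi$ to the family $\boldsymbol\pi(\boldsymbol\gamma)$, and the estimator maximizes the kernel within that family.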

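Upon finding $\hat{\boldsymbol\gamma}$, the IRT model generates model-implied probabilities for each response pattern, $\pi_{\mathbf u}(\hat{\boldsymbol\gamma}) = \hat\pi_{\mathbf u}$. Suppose these model-implied response pattern probabilities are collected into a $C \times 1$ vector $\hat{\boldsymbol\pi}$. By analogy, let a $C \times 1$ vector $\boldsymbol\pi$ contain the true (population) response pattern probabilities. Similarly, the observed proportions $p_{\mathbf u}$ can be collected into a $C \times 1$ vector $\mathbf p$. For example, for 3 dichotomously scored items there are $2^3 = 8$ item response patterns, and the response pattern probabilities and observed proportions are:

$$
\boldsymbol\pi = \begin{pmatrix} \pi_{000} \\ \pi_{001} \\ \pi_{010} \\ \pi_{011} \\ \pi_{100} \\ \pi_{101} \\ \pi_{110} \\ \pi_{111} \end{pmatrix}, \quad
\hat{\boldsymbol\pi} = \begin{pmatrix} \pi_{000}(\hat{\boldsymbol\gamma}) \\ \pi_{001}(\hat{\boldsymbol\gamma}) \\ \pi_{010}(\hat{\boldsymbol\gamma}) \\ \pi_{011}(\hat{\boldsymbol\gamma}) \\ \pi_{100}(\hat{\boldsymbol\gamma}) \\ \pi_{101}(\hat{\boldsymbol\gamma}) \\ \pi_{110}(\hat{\boldsymbol\gamma}) \\ \pi_{111}(\hat{\boldsymbol\gamma}) \end{pmatrix}, \quad
\mathbf p = \begin{pmatrix} p_{000} \\ p_{001} \\ p_{010} \\ p_{011} \\ p_{100} \\ p_{101} \\ p_{110} \\ p_{111} \end{pmatrix}. \tag{9}
$$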


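From results in discrete multivariate analysis (e.g., Bishop, Fienberg, & Holland, 1975), $\hat{\boldsymbol\gamma}$ is consistent, asymptotically normal, and asymptotically efficient, which can be summarized as follows:

$$\sqrt{N}(\hat{\boldsymbol\gamma} - \boldsymbol\gamma) \xrightarrow{d} \mathcal N_d(\mathbf 0, \mathbf F^{-1}), \tag{10}$$

where $\mathbf F = \boldsymbol\Delta'[\operatorname{diag}(\boldsymbol\pi)]^{-1}\boldsymbol\Delta$ is the $d \times d$ Fisher information matrix, with the Jacobian matrix $\boldsymbol\Delta$ defined as the $C \times d$ matrix of all first-order partial derivatives of the response pattern probabilities with respect to the item parameters:

$$\boldsymbol\Delta = \frac{\partial \boldsymbol\pi}{\partial \boldsymbol\gamma'}. \tag{11}$$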
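To make Equations (10) and (11) concrete, the sketch below approximates $\boldsymbol\Delta$ by central finite differences for a small 2PL test and then forms $\mathbf F$. Everything here is an assumption made for illustration: the parameter ordering $(a_1, c_1, \ldots, a_I, c_I)$, the standard normal prior on a rectangular quadrature grid, the step size, and the parameter values. Finite differences over response patterns are only practical for very short tests, since $\boldsymbol\Delta$ has one row per pattern.

```python
import itertools
import numpy as np

NODES = np.linspace(-6.0, 6.0, 61)
WEIGHTS = np.exp(-0.5 * NODES**2)
WEIGHTS /= WEIGHTS.sum()  # standard normal g(theta), rectangular quadrature

def pattern_probs(gamma, n_items):
    """pi(gamma) for 2PL items; gamma = (a_1, c_1, ..., a_I, c_I)."""
    a, c = gamma[0::2], gamma[1::2]
    p1 = 1.0 / (1.0 + np.exp(-(c + a * NODES[:, None])))             # (nodes, I)
    pats = np.array(list(itertools.product([0, 1], repeat=n_items)))  # (C, I)
    cond = np.prod(np.where(pats[:, None, :] == 1, p1, 1.0 - p1), axis=2)
    return cond @ WEIGHTS

def jacobian_fd(gamma, n_items, h=1e-6):
    """Delta = d pi / d gamma' (C x d) by central differences."""
    cols = []
    for j in range(gamma.size):
        step = np.zeros_like(gamma)
        step[j] = h
        cols.append((pattern_probs(gamma + step, n_items)
                     - pattern_probs(gamma - step, n_items)) / (2.0 * h))
    return np.column_stack(cols)

gamma = np.array([1.0, 0.0, 1.5, -0.5, 0.8, 0.3, 1.2, 0.6])  # hypothetical, 4 items
pi = pattern_probs(gamma, 4)
delta = jacobian_fd(gamma, 4)
fisher = delta.T @ (delta / pi[:, None])  # F = Delta' [diag(pi)]^{-1} Delta
acov = np.linalg.inv(fisher)              # asymptotic covariance of sqrt(N)(gamma_hat - gamma)
```

Each column of $\boldsymbol\Delta$ sums to zero because the pattern probabilities sum to 1 for every $\boldsymbol\gamma$; this is a convenient sanity check on any Jacobian routine.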
Distribution of Residuals under Maximum Likelihood Estimation 
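By the multivariate central limit theorem, the difference $(\mathbf p - \boldsymbol\pi)$ is asymptotically $C$-variate normal:

$$\sqrt{N}(\mathbf p - \boldsymbol\pi) \xrightarrow{d} \mathcal N_C(\mathbf 0, \boldsymbol\Sigma), \tag{12}$$

where $\boldsymbol\Sigma = \operatorname{diag}(\boldsymbol\pi) - \boldsymbol\pi\boldsymbol\pi'$ is the covariance matrix associated with the multinomial. Combining this result with Equation (10), the residual vector $(\mathbf p - \hat{\boldsymbol\pi})$ is also asymptotically $C$-variate normal under maximum likelihood estimation:

$$\sqrt{N}(\mathbf p - \hat{\boldsymbol\pi}) \xrightarrow{d} \mathcal N_C(\mathbf 0, \boldsymbol\Gamma), \tag{13}$$

where $\boldsymbol\Gamma = \boldsymbol\Sigma - \boldsymbol\Delta\mathbf F^{-1}\boldsymbol\Delta'$, and the second term reflects variability due to estimation of the item parameters.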
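The covariance matrix in Equation (13) can be formed directly from $\boldsymbol\pi$ and $\boldsymbol\Delta$. The sketch below uses a deliberately tiny hypothetical model, a one-parameter family on three cells with $\boldsymbol\pi(t) = ((1-t)^2, 2t(1-t), t^2)'$, so that $C - 1 - d = 1$ and the rank of $\boldsymbol\Gamma$ can be checked by hand; the helper name is illustrative.

```python
import numpy as np

def residual_covariance(pi, delta):
    """Gamma = Sigma - Delta F^{-1} Delta' (Equations 12-13)."""
    sigma = np.diag(pi) - np.outer(pi, pi)         # multinomial covariance
    fisher = delta.T @ (delta / pi[:, None])        # F = Delta' [diag(pi)]^{-1} Delta
    return sigma - delta @ np.linalg.solve(fisher, delta.T)

t = 0.3
pi = np.array([(1 - t) ** 2, 2 * t * (1 - t), t ** 2])
delta = np.array([[-2 * (1 - t)], [2 - 4 * t], [2 * t]])  # d pi / d t, shape (3, 1)
gamma_mat = residual_covariance(pi, delta)
```

Two properties are worth checking in any implementation: $\boldsymbol\Gamma\mathbf 1 = \mathbf 0$, because the cell probabilities always sum to 1, and $\boldsymbol\Gamma[\operatorname{diag}(\boldsymbol\pi)]^{-1}\boldsymbol\Delta = \mathbf 0$, which is why $\boldsymbol\Gamma$ has rank $C - 1 - d$ when $\boldsymbol\Delta$ has full column rank.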
Lower-order Marginal Probabilities

The IRT model also implies lower-order marginal probabilities. Consider the 3-item example from above. There are 3 first-order marginal probabilities $\pi_i = P(U_i = 1)$ ($i = 1, \ldots, 3$), one per item. There are also 3 second-order marginal probabilities $\pi_{ij} = P(U_i = 1, U_j = 1)$, one for each unique item pair ($1 \le j < i \le 3$). In general, these probabilities correspond to the $I$ univariate and $I(I-1)/2$ bivariate margins that can be obtained from the full $C$-cell contingency table using a reduction operator matrix (see, e.g., Maydeu-Olivares & Joe, 2005). An example is given below:


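$$
\boldsymbol\pi_2 = \begin{pmatrix} \pi_1 \\ \pi_2 \\ \pi_3 \\ \pi_{21} \\ \pi_{31} \\ \pi_{32} \end{pmatrix}
= \begin{pmatrix}
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} \pi_{000} \\ \pi_{001} \\ \pi_{010} \\ \pi_{011} \\ \pi_{100} \\ \pi_{101} \\ \pi_{110} \\ \pi_{111} \end{pmatrix}
= \mathbf L\boldsymbol\pi, \tag{14}
$$

where $\mathbf L$ is a fixed operator matrix of 0s and 1s that reduces the response pattern probabilities and proportions into marginal probabilities and proportions up to order 2. $\boldsymbol\pi_2$ is the vector of first- and second-order marginal probabilities. Correspondingly, $\mathbf p_2 = \mathbf L\mathbf p$ is the vector of first- and second-order observed marginal proportions.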

More general versions of the reduction operator matrices for polytomous IRT models can be derived using similar logic (see, e.g., Maydeu-Olivares & Joe, 2006; Cai & Hansen, 2013). Note that $\mathbf L$ has full row rank, so the marginal residual vector $(\mathbf p_2 - \hat{\boldsymbol\pi}_2) = \mathbf L(\mathbf p - \hat{\boldsymbol\pi})$ is a full-rank linear transformation of the multinomial residual vector $(\mathbf p - \hat{\boldsymbol\pi})$. Therefore, the marginal residual vector is asymptotically normal:
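$$\sqrt{N}(\mathbf p_2 - \hat{\boldsymbol\pi}_2) = \sqrt{N}\,\mathbf L(\mathbf p - \hat{\boldsymbol\pi}) \xrightarrow{d} \mathcal N_Q(\mathbf 0, \boldsymbol\Gamma_2), \tag{15}$$

and $\boldsymbol\Gamma_2 = \mathbf L\boldsymbol\Gamma\mathbf L' = \mathbf L\boldsymbol\Sigma\mathbf L' - \mathbf L\boldsymbol\Delta\mathbf F^{-1}\boldsymbol\Delta'\mathbf L' = \boldsymbol\Sigma_2 - \boldsymbol\Delta_2\mathbf F^{-1}\boldsymbol\Delta_2'$, where $\boldsymbol\Sigma_2 = \mathbf L\boldsymbol\Sigma\mathbf L'$ and $\boldsymbol\Delta_2 = \mathbf L\boldsymbol\Delta$ is the Jacobian for the marginal probabilities. The dimensionality $Q$ of the normal random vector equals the number of first- and second-order marginal residuals. For example, for dichotomous items, $Q = I + I(I-1)/2 = I(I+1)/2$.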
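The reduction operator in Equation (14) can be generated for any number of dichotomous items. In this sketch (hypothetical helper name, NumPy assumed), columns follow the lexicographic pattern order used in Equation (9), and the bivariate rows are ordered $\pi_{21}, \pi_{31}, \pi_{32}, \ldots$ as in the 3-item example; the row count reproduces $Q = I(I+1)/2$.

```python
import itertools
import numpy as np

def reduction_operator_L(n_items):
    """Operator L mapping pattern probabilities to first- and second-order margins.

    Rows: pi_1, ..., pi_I, then pi_ij for 1 <= j < i <= I.
    Columns: response patterns 00...0, 00...1, ..., 11...1.
    """
    pats = np.array(list(itertools.product([0, 1], repeat=n_items)))
    rows = [pats[:, i] for i in range(n_items)]       # U_i = 1
    for i in range(n_items):
        for j in range(i):
            rows.append(pats[:, i] * pats[:, j])      # U_i = 1 and U_j = 1
    return np.array(rows)

L3 = reduction_operator_L(3)
```

Because $\mathbf L$ has $2^I$ columns, it is a theoretical device; in practice the margins are computed directly from item-pair tables.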
Summed Score Probabilities 
In addition to the response pattern and marginal probabilities, the IRT model also generates model-implied summed score probabilities. For a test with $I$ items and categories coded $k = 0, \ldots, K_i - 1$ for item $i$, there are a total of $S = 1 + \sum_{i=1}^{I}(K_i - 1)$ summed scores, ranging from 0 to $S - 1$. Suppose the observed summed score proportions based on a sample of size $N$ are $\tilde p_s$ for $s = 0, \ldots, S - 1$. Under maximum likelihood estimation of the item parameters, the corresponding IRT model-implied summed score probabilities are formally defined as

$$\hat{\tilde\pi}_s = \sum_{\mathbf u} 1_s(\|\mathbf u\|)\,\hat\pi_{\mathbf u}, \tag{16}$$

where $\|\mathbf u\| = \sum_{i=1}^{I} u_i$ is a notational shorthand for the summed score associated with response pattern $\mathbf u$, and the indicator function takes a value of 1 if and only if $s = \|\mathbf u\|$:

$$1_s(\|\mathbf u\|) = \begin{cases} 1, & \text{if } s = \|\mathbf u\|, \\ 0, & \text{otherwise.} \end{cases} \tag{17}$$

Equation (16) shows that the IRT model-implied probability for summed score $s$ is a sum of all response pattern probabilities leading to summed score $s$; in other words, it may also be obtained with a reduction operator matrix.


Let $\mathbf S$ be a matrix of fixed 0s and 1s such that pre-multiplying $\boldsymbol\pi$ by $\mathbf S$ yields the summed score probabilities. Each row of $\mathbf S$ can be understood as a set of binary logical relations: an element in row $j$ of $\mathbf S$ equals 1 if and only if the corresponding response pattern in $\boldsymbol\pi$ leads to summed score $j - 1$. In general, for $I$ items, $\mathbf S$ has $S$ rows and $C$ columns. In particular, $\mathbf S$ has full row rank, and its rows are mutually orthogonal.

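Returning to the 3-item example, there are 4 summed scores in this case: 0, 1, 2, and 3. The $4 \times 8$ matrix $\mathbf S$ below relates the summed score probabilities to the original multinomial probabilities:

$$
\tilde{\boldsymbol\pi} = \begin{pmatrix} \tilde\pi_0 \\ \tilde\pi_1 \\ \tilde\pi_2 \\ \tilde\pi_3 \end{pmatrix}
= \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} \pi_{000} \\ \pi_{001} \\ \pi_{010} \\ \pi_{011} \\ \pi_{100} \\ \pi_{101} \\ \pi_{110} \\ \pi_{111} \end{pmatrix}
= \mathbf S\boldsymbol\pi. \tag{18}
$$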


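The observed summed score proportions can be obtained in a similar way:

$$
\tilde{\mathbf p} = \begin{pmatrix} \tilde p_0 \\ \tilde p_1 \\ \tilde p_2 \\ \tilde p_3 \end{pmatrix}
= \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} p_{000} \\ p_{001} \\ p_{010} \\ p_{011} \\ p_{100} \\ p_{101} \\ p_{110} \\ p_{111} \end{pmatrix}
= \mathbf S\mathbf p. \tag{19}
$$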
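The summed score operator $\mathbf S$ can likewise be built by enumerating patterns. The sketch below (hypothetical helper, NumPy assumed) handles polytomous items by taking a list of category counts $K_i$; for three dichotomous items it reproduces the $4 \times 8$ matrix in Equation (18).

```python
import itertools
import numpy as np

def summed_score_operator(n_categories):
    """S x C operator: entry (s, u) is 1 iff pattern u has summed score s.

    n_categories : list of K_i; patterns are enumerated in lexicographic order.
    """
    pats = np.array(list(itertools.product(*[range(k) for k in n_categories])))
    scores = pats.sum(axis=1)                        # ||u|| for every pattern
    n_scores = 1 + sum(k - 1 for k in n_categories)  # S = 1 + sum(K_i - 1)
    return (scores[None, :] == np.arange(n_scores)[:, None]).astype(int)

S3 = summed_score_operator([2, 2, 2])
gram = S3 @ S3.T  # diagonal, because every pattern has exactly one summed score
```

Each column contains a single 1, which is what makes the rows mutually orthogonal and gives $\mathbf S\operatorname{diag}(\boldsymbol\pi)\mathbf S' = \operatorname{diag}(\mathbf S\boldsymbol\pi)$.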


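From Equation (13), under maximum likelihood estimation, the summed score residual vector $(\tilde{\mathbf p} - \hat{\tilde{\boldsymbol\pi}})$ is asymptotically $S$-variate normal:

$$\sqrt{N}(\tilde{\mathbf p} - \hat{\tilde{\boldsymbol\pi}}) = \sqrt{N}(\mathbf S\mathbf p - \mathbf S\hat{\boldsymbol\pi}) = \sqrt{N}\,\mathbf S(\mathbf p - \hat{\boldsymbol\pi}) \xrightarrow{d} \mathcal N_S(\mathbf 0, \tilde{\boldsymbol\Gamma}), \tag{20}$$

and $\tilde{\boldsymbol\Gamma} = \mathbf S\boldsymbol\Gamma\mathbf S' = \mathbf S\operatorname{diag}(\boldsymbol\pi)\mathbf S' - \mathbf S\boldsymbol\pi\boldsymbol\pi'\mathbf S' - \mathbf S\boldsymbol\Delta\mathbf F^{-1}\boldsymbol\Delta'\mathbf S' = \operatorname{diag}(\tilde{\boldsymbol\pi}) - \tilde{\boldsymbol\pi}\tilde{\boldsymbol\pi}' - \tilde{\boldsymbol\Delta}\mathbf F^{-1}\tilde{\boldsymbol\Delta}'$, with $\tilde{\boldsymbol\Delta} = \mathbf S\boldsymbol\Delta$.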
The reason for introducing the reduction operator matrix $\mathbf S$ is primarily theoretical: it facilitates the subsequent derivation of summed score likelihood based indices for testing latent variable distribution fit. In practice, the Lord-Wingersky (1984) algorithm should be used to compute the model-implied summed score probabilities, because forming $\mathbf S\boldsymbol\pi$ directly requires all $C$ response pattern probabilities. If summed score to scale score conversion tables are computed (see Thissen & Wainer, 2001), the probabilities are automatic byproducts.
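A sketch of the Lord-Wingersky recursion, written for items with any number of categories, is given below. It assumes the item category probabilities have already been evaluated at a set of quadrature nodes; the helper name and the 2PL parameter values are hypothetical. The brute-force comparison sums pattern probabilities by summed score, which is what $\mathbf S\boldsymbol\pi$ does, and is feasible only for a handful of items.

```python
import itertools
import numpy as np

def lord_wingersky(item_probs):
    """Summed score likelihoods L(s | theta) at each quadrature node.

    item_probs : list with one (n_nodes, K_i) array per item holding T_i(k | theta).
    Returns an (n_nodes, S) array with S = 1 + sum(K_i - 1).
    """
    like = item_probs[0].copy()
    for probs in item_probs[1:]:
        n_nodes, k_i = probs.shape
        new = np.zeros((n_nodes, like.shape[1] + k_i - 1))
        for k in range(k_i):
            # A response in category k shifts every running summed score by k.
            new[:, k:k + like.shape[1]] += like * probs[:, [k]]
        like = new
    return like

nodes = np.linspace(-6.0, 6.0, 61)
weights = np.exp(-0.5 * nodes**2)
weights /= weights.sum()
a = np.array([1.0, 1.5, 0.8, 1.2])
c = np.array([0.0, -0.5, 0.3, 0.6])
p1 = 1.0 / (1.0 + np.exp(-(c + a * nodes[:, None])))             # (nodes, I)
items = [np.column_stack([1.0 - p1[:, i], p1[:, i]]) for i in range(a.size)]
pi_tilde = lord_wingersky(items).T @ weights                      # length S = 5

# Brute force: sum the 2^I pattern probabilities by summed score.
pats = np.array(list(itertools.product([0, 1], repeat=a.size)))
cond = np.prod(np.where(pats[:, None, :] == 1, p1, 1.0 - p1), axis=2)
pi_tilde_brute = np.bincount(pats.sum(axis=1), weights=cond @ weights)
```

At each node the recursion costs on the order of $I \times S$ operations rather than $2^I$, which is why it scales to realistic test lengths.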
Goodness-of-Fit Statistics for IRT Models
Existing overall goodness-of-fit indices may be used for testing latent variable distribution fit in IRT. Full-information test statistics such as the likelihood ratio $G^2$ and Pearson's $X^2$ use residuals based on the full cross-classification of response patterns to test the IRT model against the general multinomial alternative. Comparing $\hat\pi_{\mathbf u}$ and $p_{\mathbf u}$ (on logarithmic or linear scales) leads to these well-known goodness-of-fit statistics:

$$G^2 = 2N\sum_{\mathbf u} p_{\mathbf u}\log\frac{p_{\mathbf u}}{\hat\pi_{\mathbf u}}, \qquad X^2 = N\sum_{\mathbf u}\frac{(p_{\mathbf u} - \hat\pi_{\mathbf u})^2}{\hat\pi_{\mathbf u}}. \tag{21}$$

Under the null hypothesis that the IRT model fits exactly, these two statistics have the same asymptotic reference distribution: a central chi-square with $C - 1 - d$ degrees of freedom (Bishop et al., 1975). For subsequent development, it is instructive to rewrite Pearson's statistic as a quadratic form in multinomial residuals: $X^2 = N(\mathbf p - \hat{\boldsymbol\pi})'[\operatorname{diag}(\hat{\boldsymbol\pi})]^{-1}(\mathbf p - \hat{\boldsymbol\pi})$.
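Equation (21) and the quadratic-form version of $X^2$ can be checked against each other in a few lines. The helper name is hypothetical and the counts are invented for illustration; empty cells are dropped from $G^2$ under the usual convention $0 \log 0 = 0$, while $X^2$ still uses every cell.

```python
import numpy as np

def full_information_stats(counts, pi_hat):
    """Likelihood ratio G^2 and Pearson X^2 of Equation (21).

    counts : observed frequency of each response pattern
    pi_hat : model-implied pattern probabilities in the same order
    """
    counts = np.asarray(counts, dtype=float)
    pi_hat = np.asarray(pi_hat, dtype=float)
    n = counts.sum()
    p = counts / n
    seen = p > 0                          # empty cells contribute 0 to G^2
    g2 = 2.0 * n * np.sum(p[seen] * np.log(p[seen] / pi_hat[seen]))
    resid = p - pi_hat
    x2 = n * np.sum(resid**2 / pi_hat)
    # The same X^2 written as the quadratic form N r' [diag(pi_hat)]^{-1} r.
    x2_quadratic = n * resid @ np.diag(1.0 / pi_hat) @ resid
    return g2, x2, x2_quadratic

# Made-up counts for 4 cells with a uniform model, for illustration only.
g2, x2, x2_quadratic = full_information_stats([30, 20, 25, 25], [0.25] * 4)
```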

Unfortunately, as the number of items increases, the number of response patterns increases exponentially. For more than a dozen or so dichotomous items (or perhaps a handful of polytomous items), the contingency table on which the multinomial is defined becomes sparse for any realistic $N$. Consequently, the asymptotic chi-square approximations for the full-information test statistics break down (see, e.g., Bartholomew & Tzamourani, 1999), and the utility of full-information overall goodness-of-fit indices for routine IRT applications becomes questionable.
Recently, limited-information overall fit statistics such as Maydeu-Olivares and Joe's (2005) $M_2$ have been developed. Limited-information fit statistics use residuals based on lower-order (e.g., first- and second-order) margins of the contingency table. These lower-order margins are far better filled than the sparse full contingency table. There is growing awareness that limited-information tests can maintain correct size and can be more powerful than full-information tests (Cai, Maydeu-Olivares, Coffman, & Thissen, 2006; Joe & Maydeu-Olivares, 2010).

Under the assumptions that the number of first- and second-order margins is larger than the number of free parameters ($Q > d$) and that $\boldsymbol\Delta_2$ has full column rank (local identification), $M_2$ can be written as

$$M_2 = N(\mathbf p_2 - \hat{\boldsymbol\pi}_2)'\boldsymbol\Delta_2^{(c)}\big[\boldsymbol\Delta_2^{(c)\prime}\boldsymbol\Sigma_2\boldsymbol\Delta_2^{(c)}\big]^{-1}\boldsymbol\Delta_2^{(c)\prime}(\mathbf p_2 - \hat{\boldsymbol\pi}_2), \tag{22}$$

where $\boldsymbol\Delta_2^{(c)}$ is a $Q \times (Q - d)$ orthogonal complement of $\boldsymbol\Delta_2$ such that $\boldsymbol\Delta_2^{(c)\prime}\boldsymbol\Delta_2 = \mathbf 0$, and $\boldsymbol\Sigma_2$ is evaluated at the parameter estimates. From Equation (15), $\sqrt{N}(\mathbf p_2 - \hat{\boldsymbol\pi}_2)$ is asymptotically normal with zero mean and covariance matrix $\boldsymbol\Sigma_2 - \boldsymbol\Delta_2\mathbf F^{-1}\boldsymbol\Delta_2'$, which implies that the asymptotic covariance matrix of $\sqrt{N}\boldsymbol\Delta_2^{(c)\prime}(\mathbf p_2 - \hat{\boldsymbol\pi}_2)$ is $\boldsymbol\Delta_2^{(c)\prime}\boldsymbol\Sigma_2\boldsymbol\Delta_2^{(c)}$. Thus, $M_2$ is asymptotically chi-square distributed with $Q - d$ degrees of freedom. In the current simulation study, $M_2$ is used as a benchmark because of its numerous desirable properties identified in the literature (see, e.g., Cai & Hansen, 2013), and the performance of the proposed latent variable distribution fit indices is evaluated against it.
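Equation (22) can be computed once $\mathbf p_2$, $\hat{\boldsymbol\pi}_2$, $\boldsymbol\Sigma_2$, and $\boldsymbol\Delta_2$ are available. The sketch below obtains an orthonormal $\boldsymbol\Delta_2^{(c)}$ from a full singular value decomposition, which is one convenient choice among many; the statistic does not depend on which complement is used, since it involves only the projection onto the complement's column space. Helper names are hypothetical, and the inputs in the usage lines are random placeholders chosen only to exercise the algebra, not IRT quantities.

```python
import numpy as np

def orthogonal_complement(delta):
    """Q x (Q - d) orthonormal basis with basis' delta = 0 (delta has full column rank)."""
    d = delta.shape[1]
    u, _, _ = np.linalg.svd(delta, full_matrices=True)
    return u[:, d:]

def m2_statistic(n, p2, pi2, sigma2, delta2):
    """M2 of Equation (22) and its degrees of freedom Q - d."""
    dc = orthogonal_complement(delta2)
    r = p2 - pi2
    middle = dc.T @ sigma2 @ dc
    stat = n * (r @ dc) @ np.linalg.solve(middle, dc.T @ r)
    return stat, dc.shape[1]

# Placeholder inputs (not from an IRT model) just to exercise the code.
rng = np.random.default_rng(0)
q, d = 6, 2
m = rng.normal(size=(q, q))
sigma2 = m @ m.T + q * np.eye(q)   # symmetric positive definite
delta2 = rng.normal(size=(q, d))
p2, pi2 = rng.random(q), rng.random(q)
m2, df = m2_statistic(500, p2, pi2, sigma2, delta2)
```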

While an overall test may be used to detect misspecification of the latent variable distribution, the fact that such tests are also sensitive to other forms of model error (e.g., unmodeled multidimensionality) makes it difficult to pinpoint the source of misspecification. To that end, more specific diagnostic indices have been created for IRT. For example, Chen and Thissen's (1997) local dependence indices are particularly sensitive to violations of the local independence assumption. Orlando and Thissen's (2000) item fit diagnostics are another example: they examine the extent to which the IRT model fits the empirical operating characteristics of an item (e.g., whether monotonicity holds). The next section develops a set of indices that specifically target latent variable distribution fit for IRT models.
The Summed Score Likelihood Based Indices and Statistical Adjustments 

There are two important lines of reasoning behind the derivation of these model fit indices. The first is heuristic: IRT model-implied summed score probabilities may provide useful diagnostic information about the latent variable distributional assumption (Thissen & Wainer, 2001, p. 130). The second is formal: the summed score likelihood based indices are limited-information test statistics.
A Heuristic Motivation 

When the latent variable distribution assumed in the IRT model does not adequately represent the population distribution of the respondents, the model-implied summed score probabilities $\hat{\tilde\pi}_s$ will depart from the observed summed score proportions $\tilde p_s$. Hence, all that is needed is an appropriate test statistic that summarizes the degree to which the model-implied and observed summed score probabilities diverge. It is also preferable that the index be an approximately chi-square distributed test statistic. Pearson's $X^2$ introduced in the previous section meets this requirement.

Recall that the total number of summed scores is $S = 1 + \sum_{i=1}^{I}(K_i - 1)$. The Pearson-type statistic $\tilde X^2$ below directly compares the model-implied summed score probabilities $\hat{\tilde\pi}_s$ with the observed summed score proportions $\tilde p_s$:
$$\tilde X^2 = N\sum_{s=0}^{S-1}\frac{(\tilde p_s - \hat{\tilde\pi}_s)^2}{\hat{\tilde\pi}_s}, \tag{23}$$

where $\tilde p_s$ and $\hat{\tilde\pi}_s$ represent the observed and model-implied summed score probabilities for score $s$, respectively. This test statistic differs from the full-information statistic in Equation (21) because it is based on summed score probabilities rather than response pattern probabilities.
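Given respondents' observed summed scores and model-implied summed score probabilities (for instance, from the Lord-Wingersky recursion), Equation (23) is straightforward to compute. The helper below is a hypothetical NumPy sketch, and the eight scores and probabilities in the usage line are made-up values for illustration.

```python
import numpy as np

def summed_score_x2(scores, pi_tilde):
    """Pearson-type summed score statistic of Equation (23).

    scores   : observed summed score of each respondent (integers 0, ..., S - 1)
    pi_tilde : model-implied summed score probabilities (length S)
    """
    pi_tilde = np.asarray(pi_tilde, dtype=float)
    n = len(scores)
    p_tilde = np.bincount(scores, minlength=pi_tilde.size) / n  # observed proportions
    return n * np.sum((p_tilde - pi_tilde) ** 2 / pi_tilde)

# Made-up data: 8 respondents on a 3-item dichotomous test (S = 4).
x2_ss = summed_score_x2([0, 1, 1, 2, 2, 2, 3, 3], [0.1, 0.3, 0.4, 0.2])
```

Only $S$ cells enter the sum, so the table stays well filled even when the full response pattern table is hopelessly sparse.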

In preliminary studies (Li & Cai, 2012), we conjectured that under a wide variety of conditions $\tilde X^2$ has similar asymptotic distributions whose tail-area probabilities can be approximated by a central chi-squared random variable with $S - 1 - 2$ degrees of freedom under the null hypothesis that the latent variable distribution $g(\theta)$ is correctly specified in the IRT model. This conjecture is tested with simulations in the sequel.

The rationale behind these specific degrees of freedom is as follows. The $S$ summed score probabilities must sum to 1; the first subtraction of 1 reflects that constraint. Had the item parameters been known, the degrees of freedom would have been exactly $S - 1$. When the item parameters are estimated (here, by maximum marginal likelihood), an additional penalty must be introduced to reflect the effect of parameter estimation. Although the location and scale of the latent variable $\theta$ are typically fixed for model identification, the model-implied summed score distribution has no inherent location and scale; these are determined by the estimated item parameters. Hence, estimating the item parameters amounts to adding at least two more constraints on the model-implied summed score probability distribution. The details are, of course, more complex and are explained next.
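As a worked example of this rule for dichotomous items ($K_i = 2$ for every item, as in the 2PL conditions of the simulations), the number of summed scores and the reference degrees of freedom are:

```latex
\begin{aligned}
S &= 1 + \sum_{i=1}^{I}(2 - 1) = I + 1, \qquad df = S - 1 - 2 = I - 2,\\
I = 12 &:\; S = 13,\; df = 10; \qquad I = 24:\; S = 25,\; df = 22.
\end{aligned}
```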
A More Formal Derivation

While the proposed test statistics are not associated with particular marginal probabilities in the same manner as Maydeu-Olivares and Joe's (2005) $M_2$, they are nevertheless related to the response pattern probabilities via the reduction operator matrix $\mathbf S$ defined earlier (see Equation 18). It is the choice of this particular reduction operator that leads to more focused tests targeting latent variable distribution fit (see Joe & Maydeu-Olivares, 2010). For IRT models with item discrimination parameters constrained to be equal (e.g., the 1PL model), it is widely recognized that the summed scores are sufficient statistics for the latent variable. Although the summed score sufficiency property does not hold for other IRT models such as the 2PL or the graded model, researchers have nevertheless found that the summed score is an important source of information about the ordering of individuals along the latent variable continuum (e.g., van der Ark, 2005). Parameter estimation can even be based on summed score groups (Chen & Thissen, 1999).

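Using the reduction operator $\mathbf S$, the derivations above imply that the Pearson-type statistic $\tilde X^2$ can be rewritten as

$$\tilde X^2 = N\sum_{s=0}^{S-1}\frac{(\tilde p_s - \hat{\tilde\pi}_s)^2}{\hat{\tilde\pi}_s} = N(\tilde{\mathbf p} - \hat{\tilde{\boldsymbol\pi}})'[\operatorname{diag}(\hat{\tilde{\boldsymbol\pi}})]^{-1}(\tilde{\mathbf p} - \hat{\tilde{\boldsymbol\pi}}), \tag{24}$$

where $(\tilde{\mathbf p} - \hat{\tilde{\boldsymbol\pi}}) = \mathbf S(\mathbf p - \hat{\boldsymbol\pi})$ is the summed score residual vector (see Equation 20). Under the null hypothesis that the IRT model is correctly specified, the probability limit of the weight matrix is $\operatorname{plim}[\operatorname{diag}(\hat{\tilde{\boldsymbol\pi}})]^{-1} = [\operatorname{diag}(\tilde{\boldsymbol\pi})]^{-1}$, which follows from the consistency of the maximum likelihood estimator (see Equation 10), the continuity of the mapping from $\boldsymbol\gamma$ to the summed score probabilities, and the continuity of the matrix inverse. Following results on quadratic forms in random vectors (e.g., Mathai & Provost, 1992, p. 53), the asymptotic expected value of $\tilde X^2$ is

$$\begin{aligned}
\operatorname{tr}\{\tilde{\boldsymbol\Gamma}[\operatorname{diag}(\tilde{\boldsymbol\pi})]^{-1}\} &= \operatorname{tr}\{[\operatorname{diag}(\tilde{\boldsymbol\pi}) - \tilde{\boldsymbol\pi}\tilde{\boldsymbol\pi}'][\operatorname{diag}(\tilde{\boldsymbol\pi})]^{-1}\} - \operatorname{tr}\{\tilde{\boldsymbol\Delta}\mathbf F^{-1}\tilde{\boldsymbol\Delta}'[\operatorname{diag}(\tilde{\boldsymbol\pi})]^{-1}\} \\
&= S - 1 - \operatorname{tr}\{\mathbf F^{-1}\tilde{\boldsymbol\Delta}'[\operatorname{diag}(\tilde{\boldsymbol\pi})]^{-1}\tilde{\boldsymbol\Delta}\} = \mu_1.
\end{aligned} \tag{25}$$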
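The first moment in Equation (25) can be checked numerically on a toy problem. This sketch assumes 4 hypothetical 2PL items, a standard normal prior on a rectangular quadrature grid, and a finite-difference Jacobian, and computes $\mu_1$ both as $\operatorname{tr}\{\tilde{\boldsymbol\Gamma}[\operatorname{diag}(\tilde{\boldsymbol\pi})]^{-1}\}$ and through the simplified right-hand side. It is an illustration only, not the Appendix algorithm.

```python
import itertools
import numpy as np

NODES = np.linspace(-6.0, 6.0, 61)
WEIGHTS = np.exp(-0.5 * NODES**2)
WEIGHTS /= WEIGHTS.sum()                     # standard normal g(theta)
N_ITEMS = 4
PATS = np.array(list(itertools.product([0, 1], repeat=N_ITEMS)))

def pattern_probs(params):
    """pi(gamma) for 2PL items; params = (a_1, c_1, ..., a_I, c_I)."""
    a, c = params[0::2], params[1::2]
    p1 = 1.0 / (1.0 + np.exp(-(c + a * NODES[:, None])))
    cond = np.prod(np.where(PATS[:, None, :] == 1, p1, 1.0 - p1), axis=2)
    return cond @ WEIGHTS

def jacobian_fd(params, h=1e-6):
    """Delta = d pi / d gamma' by central differences."""
    cols = []
    for j in range(params.size):
        step = np.zeros_like(params)
        step[j] = h
        cols.append((pattern_probs(params + step) - pattern_probs(params - step)) / (2.0 * h))
    return np.column_stack(cols)

params = np.array([1.0, 0.0, 1.5, -0.5, 0.8, 0.3, 1.2, 0.6])  # hypothetical items
pi = pattern_probs(params)
delta = jacobian_fd(params)
fisher = delta.T @ (delta / pi[:, None])                      # F

# Summed score operator S and the reduced quantities of Equation (20).
S_op = (PATS.sum(axis=1)[None, :] == np.arange(N_ITEMS + 1)[:, None]).astype(float)
pi_t = S_op @ pi                                               # pi~
delta_t = S_op @ delta                                         # Delta~ = S Delta
cov_t = (np.diag(pi_t) - np.outer(pi_t, pi_t)
         - delta_t @ np.linalg.solve(fisher, delta_t.T))       # Gamma~

# Equation (25): trace form and simplified right-hand side.
mu1_trace = np.trace(cov_t / pi_t[None, :])                    # tr{Gamma~ [diag(pi~)]^{-1}}
mu1 = (pi_t.size - 1
       - np.trace(np.linalg.solve(fisher, delta_t.T @ (delta_t / pi_t[:, None]))))
```

Here $\tilde{\boldsymbol\Delta}$ is obtained as $\mathbf S\boldsymbol\Delta$, which requires all $2^I$ pattern derivatives; that is exactly the cost that makes a summed-score-level computation attractive for realistic test lengths.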

From Equations (24) and (25), we can see that the statistic $\tilde X^2$ is not, in general, asymptotically chi-square distributed. Even though it is a quadratic form in asymptotically normally distributed random vectors, a key condition for chi-squaredness is not met: the product of the probability limit of the weight matrix, $[\operatorname{diag}(\tilde{\boldsymbol\pi})]^{-1}$, and the asymptotic covariance matrix $\tilde{\boldsymbol\Gamma}$ of the normal random vector is not idempotent in general, i.e., $\tilde{\boldsymbol\Gamma}[\operatorname{diag}(\tilde{\boldsymbol\pi})]^{-1}\tilde{\boldsymbol\Gamma}[\operatorname{diag}(\tilde{\boldsymbol\pi})]^{-1} \neq \tilde{\boldsymbol\Gamma}[\operatorname{diag}(\tilde{\boldsymbol\pi})]^{-1}$. On the other hand, Equation (25) shows that the asymptotic expected value of $\tilde X^2$ equals $S - 1$ minus a constant that depends on the trace of $\mathbf F^{-1}\tilde{\boldsymbol\Delta}'[\operatorname{diag}(\tilde{\boldsymbol\pi})]^{-1}\tilde{\boldsymbol\Delta}$, which reflects additional uncertainty due to estimation of the item parameters. Given this first-order moment, a Satorra-Bentler type moment adjustment can be applied to the statistic so that the tail area of its distribution is better approximated by a chi-squared distribution (Satorra & Bentler, 1994; Cai et al., 2006).


Adjustment of Statistics 

According to Satorra and Bentler (1994), test statistics that do not asymptotically follow a chi-squared distribution can be corrected by matching their mean (or their mean and variance) to those of a chi-squared distribution with fixed degrees of freedom. Let $df$ denote the degrees of freedom of interest, and let $\mu_1$ denote the asymptotic expected value of $\tilde X^2$. The moment-adjusted statistic is

$$\tilde X^2_a = \tilde X^2\,\frac{df}{\mu_1}. \tag{26}$$

Theoretically, the constant $df$ can take an arbitrary value. For the purpose of comparison, $df$ takes the value $S - 1 - 2$ in this paper.
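Equation (26) is a one-line rescaling. The sketch below adds an upper-tail chi-square probability for even $df$ using the standard closed form $P(\chi^2_{df} > x) = e^{-x/2}\sum_{j=0}^{df/2-1}(x/2)^j/j!$, which covers, for example, dichotomous tests with an even number of items, where $df = S - 3 = I - 2$; for general $df$, a library routine such as `scipy.stats.chi2.sf` would be used instead. Function names and input values are hypothetical.

```python
import math

def adjusted_statistic(x2, mu1, df):
    """Mean-adjusted statistic of Equation (26): rescales x2 so its mean is df."""
    return x2 * df / mu1

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability for even df via the Poisson-sum closed form."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j          # (x/2)^j / j!
        total += term
    return math.exp(-half) * total

# Hypothetical values: 12 dichotomous items, so S = 13 and df = S - 1 - 2 = 10.
df = 13 - 1 - 2
x2_adj = adjusted_statistic(x2=15.0, mu1=9.4, df=df)
p_value = chi2_sf_even_df(x2_adj, df)
```

When $\mu_1 < df$ the adjustment inflates the statistic, and when $\mu_1 > df$ it shrinks it; in either case the rescaled statistic has asymptotic mean $df$.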

One challenge in obtaining the adjusted statistic is calculating the first-order moment in Equation (25). Some commercial IRT software (e.g., flexMIRT®; Cai, 2013) provides the Fisher information matrix $\mathbf F$ and the model-implied summed score probabilities $\hat{\tilde{\boldsymbol\pi}}$ in the output file, but currently none produces the Jacobian matrix. Numerical calculation of the Jacobian matrix can be computationally demanding, especially when the number of items ($I$) is large. Take the 2PL model as an example: it requires the computation of $2 \times 2^I$ first-order derivatives. For a test of 12 items, 8,192 first-order derivatives need to be computed; when $I$ increases to 24, 33,554,432 first-order derivatives are required. To solve this problem, a modification of the Lord-Wingersky algorithm (Lord & Wingersky, 1984) for calculating the Jacobian matrix is developed (see Appendix). Once the Jacobian matrix is computed, the first-order moment of $\tilde X^2$ can be computed.
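The Appendix algorithm is not reproduced here. As a simple point of comparison, the sketch below shows a different, purely numerical route, assumed here for illustration only: central finite differences applied to the Lord-Wingersky output, which yields $\tilde{\boldsymbol\Delta}$ directly from $2d$ runs of the recursion instead of differentiating all $2^I$ response pattern probabilities. It assumes 2PL items, a standard normal prior on a rectangular grid, and the parameter ordering $(a_1, c_1, \ldots, a_I, c_I)$; all names and values are hypothetical.

```python
import numpy as np

NODES = np.linspace(-6.0, 6.0, 61)
WEIGHTS = np.exp(-0.5 * NODES**2)
WEIGHTS /= WEIGHTS.sum()

def summed_score_probs(params):
    """Model-implied summed score probabilities for 2PL items (Lord-Wingersky)."""
    a, c = params[0::2], params[1::2]
    p1 = 1.0 / (1.0 + np.exp(-(c + a * NODES[:, None])))  # (nodes, I)
    like = np.ones((NODES.size, 1))                        # score 0 before any item
    for i in range(a.size):
        new = np.zeros((NODES.size, like.shape[1] + 1))
        new[:, :-1] += like * (1.0 - p1[:, [i]])           # response 0 keeps the score
        new[:, 1:] += like * p1[:, [i]]                    # response 1 adds one point
        like = new
    return like.T @ WEIGHTS

def summed_score_jacobian(params, h=1e-6):
    """Delta~ = d pi~ / d gamma' (S x d) by central differences."""
    cols = []
    for j in range(params.size):
        step = np.zeros_like(params)
        step[j] = h
        cols.append((summed_score_probs(params + step)
                     - summed_score_probs(params - step)) / (2.0 * h))
    return np.column_stack(cols)

rng = np.random.default_rng(1)
n_items = 24
params = np.column_stack([rng.uniform(0.8, 2.0, n_items),
                          rng.uniform(-1.0, 1.0, n_items)]).ravel()
delta_t = summed_score_jacobian(params)   # 25 x 48, from 96 recursion runs
```

Finite differences trade exactness for simplicity; an analytic derivative of the recursion avoids the step-size choice altogether.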

Simulations 

Simulations were undertaken to evaluate the summed score likelihood based indices $\tilde X^2$ and $\tilde X^2_a$ by comparing them with Maydeu-Olivares and Joe's $M_2$. There were 108 conditions ($2 \times 2 \times 3 \times 3 \times 3$), with 1,000 replications in each condition (Table 1). The manipulated factors were the IRT model type (2PL or graded), the number of items (12 or 24), the sample size (500, 1,000, or 1,500), the dispersion of item parameters (equal, random, or dispersed), and the latent variable distribution (unidimensional normal, unidimensional nonnormal, or multidimensional multivariate normal).

In the null condition, response pattern data were simulated with a unidimensional, normally distributed latent variable. In the alternative conditions, response pattern data were simulated either with a nonnormally distributed latent variable or with a bivariate normally distributed latent variable. The nonnormal $\theta$s were generated from a 1:4 mixture of two normal densities (mean 1 and SD 0.4; mean 0 and SD 1). The multidimensional $\theta$ distribution is standard bivariate normal with a correlation of 0.9, representing substantial overlap between the two dimensions. Half of the items loaded on each dimension in a pure between-item multidimensional model; in other words, each item is directly influenced by only a single dimension, but the dimensions are correlated.

There were three conditions for the item parameters. In the “Equal Slopes and Equal Intercepts” condition, all slope parameters were fixed to 1 and all intercept parameters to 0. In the “Random Slopes and Random Intercepts” condition, parameters for 24 items were randomly generated with properties mimicking standard educational and psychological assessments: discrimination ($a$) parameters were drawn from a log-normal distribution (M = 0.5, SD = 0.2), threshold ($b$) values were drawn from a normal distribution (M = 0, SD = 0.75), and the intercepts ($c$) were calculated as $c = -ab$. Parameters for the first 12 items were used for the shorter tests. In the “Dispersed Slopes and Dispersed Intercepts” condition, the item slope parameters were spread from 1 to 3 in equal increments, and the item thresholds were spread from −2 to 2 across the 12 or 24 items.
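The three item parameter conditions can be sketched as follows, together with a 2PL response generator. The sketch assumes the slope-intercept logistic traceline $T_i(1\mid\theta) = 1/\{1 + \exp[-(a_i\theta + c_i)]\}$, that M = 0.5 and SD = 0.2 are the mean and SD of $\log a$, and that in the dispersed condition slopes and thresholds increase together across items; the condition labels and function names are hypothetical.

```python
import numpy as np

def item_parameters(condition, n_items, rng=None):
    """Return slope (a) and intercept (c) vectors for one item parameter condition.

    condition is one of "equal", "random", "dispersed" (hypothetical labels).
    """
    if condition == "equal":
        return np.ones(n_items), np.zeros(n_items)
    if condition == "random":
        rng = np.random.default_rng() if rng is None else rng
        # Always draw 24 items so that a 12-item test reuses the first 12.
        a = rng.lognormal(mean=0.5, sigma=0.2, size=24)   # assumed: mean/SD of log(a)
        b = rng.normal(0.0, 0.75, size=24)                # thresholds
        return a[:n_items], (-a * b)[:n_items]            # intercepts c = -ab
    if condition == "dispersed":
        a = np.linspace(1.0, 3.0, n_items)                # equal increments from 1 to 3
        b = np.linspace(-2.0, 2.0, n_items)               # assumed to increase with a
        return a, -a * b
    raise ValueError(f"unknown condition: {condition!r}")


def simulate_2pl(theta, a, c, rng):
    """0/1 responses with P(u_i = 1 | theta) = 1 / (1 + exp(-(a_i * theta + c_i)))."""
    p = 1.0 / (1.0 + np.exp(-(np.outer(theta, a) + c)))  # persons x items
    return (rng.random(p.shape) < p).astype(int)
```

Passing generators with the same seed for the 12- and 24-item random conditions makes the short test reuse the first 12 items of the long one, mirroring the design described above.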


The fitted models were standard unidimensional IRT models. In the null conditions, the data-generating and fitted models were the same. In the alternative conditions, the fitted models were misspecified because they ignored either latent variable nonnormality or multidimensionality. Bock and Aitkin's (1981) EM algorithm was used to obtain maximum likelihood estimates, and the Lord-Wingersky (1984) algorithm was used to compute the model-implied summed score probabilities.

To compare the performance of the fit statistics, empirical Type I error rates were computed in the null conditions and empirical power was computed in the alternative conditions, each at three alpha levels: 0.01, 0.05, and 0.10. Maydeu-Olivares and Joe's $M_2$ was included as a benchmark.
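Empirical Type I error rates and empirical power are both proportions of replications rejected at a given alpha. A sketch assuming SciPy, with each statistic referred to a central chi-squared distribution with the stated degrees of freedom (the function name is hypothetical):

```python
import numpy as np
from scipy.stats import chi2

def rejection_rates(stats, df, alphas=(0.01, 0.05, 0.10)):
    """Share of replications rejected at each alpha, using a chi-squared(df) reference.

    Under the null this is the empirical Type I error rate; under an
    alternative it is the empirical power.
    """
    pvals = chi2.sf(np.asarray(stats, dtype=float), df)   # upper-tail areas
    return {alpha: float(np.mean(pvals < alpha)) for alpha in alphas}


# Example: 1000 replications drawn from the reference distribution itself,
# so each rate should be near its nominal alpha.
rng = np.random.default_rng(0)
print(rejection_rates(rng.chisquare(10, size=1000), df=10))
```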

Results 

Type I Error Rates 

Table 2 and Table 3 present the simulation results for the unidimensional normal case under the null hypothesis. How well the tail areas of the proposed statistics' distributions are approximated is examined by comparing the observed Type I error rates with the nominal alpha levels. When the slope and threshold parameters are equal across items, the adjusted and unadjusted summed score likelihood based indices both work well: the empirical rejection rates are close to the corresponding alpha levels. However, when the item parameters become dispersed, the adjusted statistic $X^2_c$ performs better than the unadjusted statistic $X^2$. These results hold across the numbers of items and sample sizes studied.

Furthermore, as suggested earlier, the observed means of these indices should be close to the expected values of the approximating chi-squared distributions (the degrees of freedom). The results in Table 2 and Table 3 confirm that when the item parameters are equal, the means are close to the degrees of freedom and the variances are approximately twice the degrees of freedom. The results improve as the number of items or the sample size increases. For the 2PL model with dispersed item parameters, the moment-adjusted statistic $X^2_c$ improves upon $X^2$, which is referred to a chi-squared distribution with heuristic degrees of freedom. For the graded model, however, both $X^2$ and $X^2_c$ perform well in the null condition. In addition, Maydeu-Olivares and Joe's $M_2$ appears to be well calibrated in the conditions we tested.
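The calibration checks described here (mean near $df$, variance near $2\,df$) can be computed as below. The Kolmogorov-Smirnov p-value is included on the assumption that the “KS p” columns of Tables 2 and 3 report a one-sample test against the reference chi-squared distribution; the function name is hypothetical.

```python
import numpy as np
from scipy.stats import chi2, kstest

def calibration_summary(stats, df):
    """Mean, variance, and KS p-value of replicated statistics versus chi-squared(df).

    For a well-calibrated statistic the mean should be near df, the
    variance near 2 * df, and the KS p-value should not be small.
    """
    x = np.asarray(stats, dtype=float)
    ks = kstest(x, chi2(df).cdf)          # one-sample test against the reference cdf
    return {"mean": x.mean(), "var": x.var(ddof=1), "ks_p": ks.pvalue}
```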
Power 

Table 4 and Table 5 show that the summed score likelihood based indices have substantially higher power than $M_2$ when the latent variable distribution is nonnormal. The performance of the proposed statistics is heavily influenced by the number of items and the dispersion of the item parameters. For both the 2PL and graded models, the power of the proposed indices grows as the sample size and the number of items increase. This is to be expected, because more data carry more information about the latent variable distribution. When the item slope and threshold parameters are equal across items, the unadjusted and adjusted statistics perform equally well; when the item parameters are dispersed, the adjusted statistic $X^2_c$ has higher power than the unadjusted statistic $X^2$. Finally, Table 6 and Table 7 provide some evidence that, in contrast to $M_2$, the summed score likelihood based indices are not sensitive to model misspecification related to multidimensionality. This is a desirable feature: the proposed indices are meant to target a specific form of model misspecification, whereas $M_2$ is a more general index for global model fit assessment.
An Application to Empirical Data

We illustrate the test statistics with empirical data. Twelve items related to the positive consequences of nicotine (PCN; Tucker et al., 2014), part of a questionnaire dealing with various attitudes, beliefs, and behaviors related to smoking (Shadel, Edelen, & Tucker, 2011), were administered to a sample of 2717 daily cigarette smokers. Each item was rated on a 5-point ordinal scale. This study was part of the development of the National Institutes of Health's Patient-Reported Outcomes Measurement Information System (PROMIS), and extensive item and dimensionality analyses were conducted before the items were calibrated as unidimensional. The density plot of the latent variable distribution for this subscale (Figure 1) shows a clear departure from the standard normal distribution: it has two local maxima near the center instead of a single bell-shaped peak. Table 8 presents the content of the 12 items from the PROMIS smoking assessment.

When the normal unidimensional IRT model is used, $X^2 = 208.5$ and $X^2_c = 179.3$ ($df = 46$, $p < 0.0001$), indicating a significant departure from latent variable normality. When the latent density is instead estimated with the empirical histogram method during item parameter estimation, $X^2 = 51.2$ and $X^2_c = 49.4$ ($df = 46$, $p > 0.1$). In sum, we conclude that the latent variable distribution for this set of items is probably nonnormal and that the proposed indices were able to detect the violation of the latent variable distribution assumption.
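The reported tail areas can be checked directly from the statistics and $df = 46$, assuming SciPy's chi-squared survival function:

```python
from scipy.stats import chi2

DF = 46  # degrees of freedom reported for the 12 PCN items
reported = {
    ("normal", "X2"): 208.5,
    ("normal", "X2_c"): 179.3,
    ("empirical histogram", "X2"): 51.2,
    ("empirical histogram", "X2_c"): 49.4,
}
pvalues = {key: chi2.sf(stat, DF) for key, stat in reported.items()}
for (density, index), p in pvalues.items():
    print(f"{density} latent density, {index}: p = {p:.2e}")
```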

Discussion 
Normality of the latent variable distribution is a critical assumption in standard marginal maximum likelihood estimation for IRT models. In real-world applications, however, the distribution of latent variables can be nonnormal, and detecting latent variable nonnormality is important for item analysis and test scoring. In this study we propose summed score likelihood based indices for testing departures from normality, and we develop a Satorra-Bentler type moment adjustment to approximate the tail-area probabilities of the indices.

In the simulation study, the performance of the unadjusted and adjusted summed score likelihood based statistics was compared with that of $M_2$. The moment-adjusted index performed well for both dichotomous and polytomous data and maintained correct test size across the numbers of items, sample sizes, and types of IRT model considered. The unadjusted statistic did not work as well, especially when the item parameters were dispersed. Furthermore, the indices were particularly sensitive to latent variable nonnormality and not sensitive to other kinds of model misfit such as multidimensionality.

An interesting finding is that the general goodness-of-fit statistic $M_2$ (Maydeu-Olivares & Joe, 2005) has almost no power against the nonnormal alternative and hence cannot be recommended for testing latent variable distribution fit in IRT models (see also Hansen et al., 2016). A plausible explanation is that $M_2$ is based only on the first- and second-order margins of the underlying contingency table, whereas detecting latent variable distributional misfit may require information from higher-order margins.

This study is not without its limitations. First, the distributions of the proposed indices are not exactly chi-squared. In our study, their tail-area probabilities were approximated to first order by a chi-squared variable, using the item parameter error covariance matrix and the Jacobian of the summed score probabilities with respect to the item parameters. We focus on the first-order correction because of its simplicity and because we observed empirically that the results of the second-order correction did not differ substantively from those of the first-order correction. In the future, higher-order moments could be considered to improve the performance of the adjusted statistics in situations we have not examined. Second, only a limited number of null conditions and only two alternative population distributions were tested in the simulations; more extensive simulations are needed to fully understand the performance of the test statistics. Third, we only studied the properties of the statistics and the corrections under maximum likelihood estimation. In principle, one could derive similar statistics under limited-information estimation (e.g., weighted least squares). Finally, this study only considered conditions in which the item response data are assumed to be unidimensional. Multidimensional IRT models (MIRT; Reckase, 2009) should be considered in subsequent work. One particularly popular model in educational and psychological research is the full-information item bifactor model (Gibbons & Hedeker, 1992; Cai, Yang, & Hansen, 2011; Reise, 2012). In this model, all items load on a general dimension, and each item may load on at most one specific dimension; the specific dimensions influence non-overlapping subsets of items. This feature of bifactor models implies a useful relation between the observed summed score and the distribution of the general latent dimension (Cai, 2015), which offers an opportunity to test the assumption about the distribution of the general dimension with summed score likelihood based statistics.
Appendix: A Modified Lord-Wingersky Algorithm for Jacobian Computations

Consider a test with $n$ dichotomous items calibrated with a 2PL IRT model. Recall that $T_i(1\mid\theta)$ is item $i$'s traceline for category 1 (Equation 1), with $T_i(0\mid\theta) = 1 - T_i(1\mid\theta)$ for category 0. There are $2^n$ possible response patterns, each denoted $u = (u_1, \ldots, u_n)$. Under the assumption of conditional independence, the likelihood of a response pattern $u$ can be expressed as $L(u\mid\theta) = \prod_{i=1}^{n} T_i(u_i\mid\theta)$. For $n$ dichotomous items, the summed score $s$ ranges from 0 to $n$, so $S = n + 1$ is the number of possible summed scores. Recall that $\lVert u\rVert = \sum_{i=1}^{n} u_i$ is a notational shorthand for the summed score associated with response pattern $u$ (see Equation 16). The likelihood for summed score $s = 0, \ldots, n$ is defined as

$$L(s\mid\theta) = \sum_{\lVert u\rVert = s} L(u\mid\theta) = \sum_{\lVert u\rVert = s} \prod_{i=1}^{n} T_i(u_i\mid\theta). \tag{27}$$
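Equation (27) can be evaluated literally by enumerating all $2^n$ patterns, which is feasible only for small $n$ but gives a useful reference. The sketch assumes the logistic slope-intercept traceline $T_i(1\mid\theta) = 1/\{1 + \exp[-(a_i\theta + c_i)]\}$ (Equation 1 is not reproduced here; this form does reproduce the values in Table A1); the function names are hypothetical.

```python
import itertools
import numpy as np

def traceline_2pl(theta, a, c):
    """T_i(1 | theta), assumed logistic slope-intercept form without a scaling constant."""
    return 1.0 / (1.0 + np.exp(-(a * theta + c)))


def summed_score_likelihood_bruteforce(theta, a, c):
    """Equation (27): add up L(u | theta) over all patterns u with ||u|| = s."""
    p1 = traceline_2pl(theta, np.asarray(a, float), np.asarray(c, float))
    n = len(p1)
    like = np.zeros(n + 1)
    for pattern in itertools.product((0, 1), repeat=n):   # all 2**n patterns
        u = np.array(pattern)
        like[u.sum()] += np.prod(np.where(u == 1, p1, 1.0 - p1))   # L(u | theta)
    return like


# Items 1 and 2 of the appendix example at theta = -2 (compare Table A1).
print(np.round(summed_score_likelihood_bruteforce(-2.0, [1.0, 0.8], [-0.2, 0.6]), 3))
```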


That is, the likelihood of a summed score $s$ is the sum of the likelihoods of all response patterns with $\lVert u\rVert = s$. In the Lord-Wingersky algorithm, the summed score likelihoods are built up recursively, one item at a time (Lord & Wingersky, 1984). Let $L_i(s\mid\theta)$ denote the likelihood for summed score $s$ after item $i$ has been added to the computation. In the first step, two summed score likelihoods are computed from the tracelines of item 1: $L_1(0\mid\theta) = T_1(0\mid\theta)$ and $L_1(1\mid\theta) = T_1(1\mid\theta)$.

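In the second step, three summed score likelihoods are obtained from the likelihoods of step 1 and the tracelines of item 2:

$$\begin{aligned}
L_2(0\mid\theta) &= L_1(0\mid\theta)\,T_2(0\mid\theta),\\
L_2(1\mid\theta) &= L_1(1\mid\theta)\,T_2(0\mid\theta) + L_1(0\mid\theta)\,T_2(1\mid\theta),\\
L_2(2\mid\theta) &= L_1(1\mid\theta)\,T_2(1\mid\theta).
\end{aligned} \tag{28}$$

Suppose $n$ items have been added. The likelihoods for summed scores $0, \ldots, n$ are

$$\begin{aligned}
L_n(0\mid\theta) &= L_{n-1}(0\mid\theta)\,T_n(0\mid\theta),\\
L_n(s\mid\theta) &= L_{n-1}(s\mid\theta)\,T_n(0\mid\theta) + L_{n-1}(s-1\mid\theta)\,T_n(1\mid\theta), \quad s = 1, \ldots, n-1,\\
L_n(n\mid\theta) &= L_{n-1}(n-1\mid\theta)\,T_n(1\mid\theta).
\end{aligned} \tag{29}$$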
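The recursion in Equations (28) and (29) replaces the $2^n$-term sum with $O(n^2)$ operations. A minimal sketch at a single $\theta$, taking the category-1 traceline values as input (the function name is hypothetical):

```python
import numpy as np

def lord_wingersky(p1):
    """Summed score likelihoods L_n(s | theta), s = 0..n, by Equations (28)-(29).

    p1 holds T_i(1 | theta) for items i = 1..n at a single theta.
    """
    like = np.array([1.0 - p1[0], p1[0]])      # step 1: L_1(0), L_1(1)
    for p in p1[1:]:
        new = np.zeros(len(like) + 1)
        new[:-1] += like * (1.0 - p)           # item scored 0: summed score unchanged
        new[1:] += like * p                    # item scored 1: summed score + 1
        like = new
    return like


print(lord_wingersky([0.2, 0.7, 0.4]))   # four likelihoods that sum to 1
```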

To obtain the Jacobian matrix of the summed score likelihoods with respect to the item parameters, the Lord-Wingersky algorithm is adapted slightly. As mentioned above, in the first step there are only two summed score likelihoods, based on item 1: $L_1(0\mid\theta)$ and $L_1(1\mid\theta)$. Their first-order derivatives with respect to a generic item parameter $\gamma_1$ of item 1 are

$$\frac{\partial L_1(0\mid\theta)}{\partial\gamma_1} = \frac{\partial T_1(0\mid\theta)}{\partial\gamma_1}, \qquad \frac{\partial L_1(1\mid\theta)}{\partial\gamma_1} = \frac{\partial T_1(1\mid\theta)}{\partial\gamma_1}. \tag{30}$$


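In the second step, item 2 is added, with a generic item parameter $\gamma_2$. The first-order derivatives of the summed score likelihoods with respect to $\gamma_1$ and $\gamma_2$ follow from applying the product rule to Equation (28):

$$\begin{aligned}
\frac{\partial L_2(0\mid\theta)}{\partial\gamma_1} &= \frac{\partial L_1(0\mid\theta)}{\partial\gamma_1}\,T_2(0\mid\theta),\\
\frac{\partial L_2(1\mid\theta)}{\partial\gamma_1} &= \frac{\partial L_1(1\mid\theta)}{\partial\gamma_1}\,T_2(0\mid\theta) + \frac{\partial L_1(0\mid\theta)}{\partial\gamma_1}\,T_2(1\mid\theta),\\
\frac{\partial L_2(2\mid\theta)}{\partial\gamma_1} &= \frac{\partial L_1(1\mid\theta)}{\partial\gamma_1}\,T_2(1\mid\theta),\\
\frac{\partial L_2(0\mid\theta)}{\partial\gamma_2} &= L_1(0\mid\theta)\,\frac{\partial T_2(0\mid\theta)}{\partial\gamma_2},\\
\frac{\partial L_2(1\mid\theta)}{\partial\gamma_2} &= L_1(1\mid\theta)\,\frac{\partial T_2(0\mid\theta)}{\partial\gamma_2} + L_1(0\mid\theta)\,\frac{\partial T_2(1\mid\theta)}{\partial\gamma_2},\\
\frac{\partial L_2(2\mid\theta)}{\partial\gamma_2} &= L_1(1\mid\theta)\,\frac{\partial T_2(1\mid\theta)}{\partial\gamma_2}.
\end{aligned} \tag{31}$$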


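Generalizing to $n$ items, the first-order derivatives of the summed score likelihoods with respect to the item parameters $(\gamma_1, \ldots, \gamma_n)$ are as follows, where $j = 1, \ldots, n-1$ indexes the previously added items and $s = 1, \ldots, n-1$:

$$\begin{aligned}
\frac{\partial L_n(0\mid\theta)}{\partial\gamma_j} &= \frac{\partial L_{n-1}(0\mid\theta)}{\partial\gamma_j}\,T_n(0\mid\theta),\\
\frac{\partial L_n(s\mid\theta)}{\partial\gamma_j} &= \frac{\partial L_{n-1}(s\mid\theta)}{\partial\gamma_j}\,T_n(0\mid\theta) + \frac{\partial L_{n-1}(s-1\mid\theta)}{\partial\gamma_j}\,T_n(1\mid\theta),\\
\frac{\partial L_n(n\mid\theta)}{\partial\gamma_j} &= \frac{\partial L_{n-1}(n-1\mid\theta)}{\partial\gamma_j}\,T_n(1\mid\theta),\\
\frac{\partial L_n(0\mid\theta)}{\partial\gamma_n} &= L_{n-1}(0\mid\theta)\,\frac{\partial T_n(0\mid\theta)}{\partial\gamma_n},\\
\frac{\partial L_n(s\mid\theta)}{\partial\gamma_n} &= L_{n-1}(s\mid\theta)\,\frac{\partial T_n(0\mid\theta)}{\partial\gamma_n} + L_{n-1}(s-1\mid\theta)\,\frac{\partial T_n(1\mid\theta)}{\partial\gamma_n},\\
\frac{\partial L_n(n\mid\theta)}{\partial\gamma_n} &= L_{n-1}(n-1\mid\theta)\,\frac{\partial T_n(1\mid\theta)}{\partial\gamma_n}.
\end{aligned} \tag{32}$$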

The modified Lord-Wingersky algorithm for calculating the Jacobian matrix is illustrated with an example. Consider a simple test with three dichotomous items, with slope parameters $a = (1.0, 0.8, 1.2)$ and intercept parameters $c = (-0.2, 0.6, -1.0)$. Recall that the marginal probability of summed score $s$ for a known latent density $g(\theta)$ is

$$p(s) = \int L(s\mid\theta)\,g(\theta)\,d\theta. \tag{33}$$

The integral in Equation (33) must be approximated by quadrature. We demonstrate the algorithm by showing the calculations over a set of quadrature points (Cai, 2015). Using $Q$ quadrature points, we approximate the marginal probability as

$$p(s) = \int L(s\mid\theta)\,g(\theta)\,d\theta \approx \sum_{q=1}^{Q} L(s\mid X_q)\,W(X_q), \tag{34}$$

where $X_q$ is a quadrature node and $W(X_q)$ is the corresponding quadrature weight. To obtain $W(X_q)$, the ordinates of the prior density are normalized (Cai, 2015), i.e., $W(X_q) = g(X_q) / \sum_{k=1}^{Q} g(X_k)$.
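Equation (34), with normalized prior ordinates as weights, can be sketched as follows, assuming a standard normal $g(\theta)$ and the logistic 2PL traceline; the function names are hypothetical.

```python
import numpy as np

def normalized_weights(nodes, density):
    """W(X_q) = g(X_q) / sum_k g(X_k): prior ordinates rescaled to sum to one."""
    g = density(np.asarray(nodes, float))
    return g / g.sum()


def std_normal_pdf(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)


def marginal_summed_score_probs(a, c, nodes, weights):
    """Equation (34): p(s) ~= sum_q L(s | X_q) W(X_q), assuming logistic 2PL tracelines."""
    a, c = np.asarray(a, float), np.asarray(c, float)
    probs = np.zeros(len(a) + 1)
    for x, w in zip(nodes, weights):
        like = np.array([1.0])
        for p in 1.0 / (1.0 + np.exp(-(a * x + c))):      # Lord-Wingersky at node x
            like = np.append(like * (1.0 - p), 0.0) + np.append(0.0, like * p)
        probs += w * like
    return probs


nodes = np.linspace(-2.0, 2.0, 5)                 # the five rectangular points of Table A1
weights = normalized_weights(nodes, std_normal_pdf)
print(np.round(weights, 3))
print(marginal_summed_score_probs([1.0, 0.8, 1.2], [-0.2, 0.6, -1.0], nodes, weights))
```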

Table A1 shows the recursive computations for item 3's slope parameter: the summed score likelihoods, the first-order derivatives of the tracelines, and the first-order derivatives of the summed score likelihoods at five equally spaced quadrature points ($Q = 5$): −2, −1, 0, 1, and 2. More quadrature points should be used for better precision (Cai, 2015). The first block presents the summed score likelihoods after items 1 and 2 have been added. The second block presents the first-order derivatives of item 3's tracelines with respect to its slope parameter. The third block presents the first-order derivatives of the summed score likelihoods with respect to item 3's slope parameter.
Table A2 presents the first-order derivatives of the summed score probabilities with respect to item 3's slope parameter (the desired Jacobian elements). $W(\theta)$ denotes the quadrature weight at each $\theta$ level. The “weighted derivatives” are obtained by multiplying, point by point, the first-order derivatives of the summed score likelihoods by $W(\theta)$. The last column, “Jacobian,” gives the first-order derivatives of the summed score probabilities with respect to item 3's slope parameter; each entry is the sum of the weighted derivatives over all quadrature points for the corresponding summed score.
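The entries of Tables A1 and A2 can be reproduced with a short script, assuming the logistic slope-intercept 2PL traceline and normal-density ordinates as weights. It computes $L_2(s\mid\theta)$ at each node, applies the last step of Equation (32) for item 3's slope, and sums the weighted derivatives; the helper name `lw` is hypothetical. The result should match the Jacobian column of Table A2 to three decimals.

```python
import numpy as np

# Worked example: three 2PL items, five rectangular nodes,
# Jacobian of p(s) with respect to item 3's slope a_3.
a = np.array([1.0, 0.8, 1.2])
c = np.array([-0.2, 0.6, -1.0])
nodes = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
g = np.exp(-0.5 * nodes**2)              # normal ordinates (the constant cancels below)
weights = g / g.sum()                    # W(X_q)


def lw(p1):
    """Lord-Wingersky summed score likelihoods for 0/1 items at one node."""
    like = np.array([1.0])
    for p in p1:
        like = np.append(like * (1 - p), 0.0) + np.append(0.0, like * p)
    return like


jacobian = np.zeros(4)
for x, w in zip(nodes, weights):
    p = 1.0 / (1.0 + np.exp(-(a * x + c)))
    L2 = lw(p[:2])                                   # likelihoods after items 1 and 2
    dT1 = x * p[2] * (1 - p[2])                      # dT_3(1|x)/da_3; dT_3(0|x)/da_3 = -dT1
    dL3 = np.append(L2 * (-dT1), 0.0) + np.append(0.0, L2 * dT1)   # last step of Eq. (32)
    jacobian += w * dL3                              # weighted derivatives, summed over nodes

print(np.round(jacobian, 3))
```

Because the summed score probabilities sum to one, the four Jacobian entries sum to zero (.008 − .023 − .029 + .044 = 0), which gives a quick check on the computation.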
Table 1
Manipulated Factors and Conditions for Simulation Study
Factor (Levels)                     Conditions
Type of IRT Model (2)               2PL, Graded
Number of Items (2)                 12, 24
Sample Size (3)                     500, 1000, 1500
Values of Item Parameters (3)       Equal Slopes and Equal Intercepts
                                    Random Slopes and Random Intercepts
                                    Dispersed Slopes and Dispersed Intercepts
Latent Variable Distribution (3)    Normally Distributed Unidimensional
                                    Nonnormally Distributed Unidimensional
                                    Correlated Bivariate Normally Distributed
Note. The factors are fully crossed in a 2 × 2 × 3 × 3 × 3 design with 1000 attempted replications per cell.
Table 2
Selected Simulation Results under the Null Hypothesis: Normally Distributed Unidimensional Latent Variable in 2PL Models

                          Equal Slopes and Intercepts        Random Slopes and Intercepts       Dispersed Slopes and Intercepts
                                         ERR*                               ERR*                               ERR*
n   N     Index    df     M      Var    .01   .05   KS p     M      Var    .01   .05   KS p     M      Var    .01   .05   KS p
12  500   $X^2$    10     10.0   19.7   .01   .04   .94      9.2    18.1   .01   .04   .00      9.1    18.0   .01   .03   .00
          $X^2_c$  10     10.0   19.9   .01   .05   .98      10.1   21.5   .01   .06   .51      10.4   23.4   .02   .06   .08
          $M_2$    54     54.3   119    .02   .06   .33      54.5   112.8  .01   .06   .15      54.2   116.4  .01   .06   .25
12  1500  $X^2$    10     10.0   18.8   .01   .05   .64      9.3    17.2   .01   .03   .00      8.9    15.9   .01   .03   .00
          $X^2_c$  10     10.0   18.9   .01   .05   .49      10.1   20.1   .01   .05   .96      10.0   20.2   .01   .04   1.00
          $M_2$    54     53.8   112.9  .01   .06   .83      53.8   101.2  .01   .04   .79      53.8   107.0  .01   .05   .36
24  1500  $X^2$    22     21.8   45.3   .01   .06   .15      21.5   40.2   .01   .04   .03      21.4   44.8   .01   .05   .03
          $X^2_c$  22     21.9   45.5   .01   .06   .26      22.1   42.3   .01   .05   .75      22.2   48.0   .02   .06   .70
          $M_2$    252    252.9  483.3  .02   .05   .20      251.9  502.2  .01   .05   .81      251.6  526.7  .01   .04   .55
Note. n = number of items; N = sample size.
*Empirical rejection rates at α levels 0.01 and 0.05.
Table 3
Selected Simulation Results under the Null Hypothesis: Normally Distributed Unidimensional Latent Variable in Graded Models

                          Equal Slopes and Intercepts        Random Slopes and Intercepts       Dispersed Slopes and Intercepts
                                         ERR*                               ERR*                               ERR*
n   N     Index    df     M      Var    .01   .05   KS p     M      Var    .01   .05   KS p     M      Var    .01   .05   KS p
12  500   $X^2$    34     34.5   69.9   .01   .06   .07      34.1   65.7   .01   .05   .67      33.8   65.6   .01   .04   .97
          $X^2_c$  34     34.6   70.3   .01   .06   .02      34.4   66.8   .01   .05   .12      34.5   68.4   .01   .06   .04
          $M_2$    30     29.5   60.4   .01   .05   .06      29.8   59.0   .01   .05   .63      29.7   ?      ?     ?     ?
12  1500  $X^2$    34     33.8   69.0   .01   .04   .64      33.4   59.6   .01   .03   .01      33.5   61.8   .01   .03   .21
          $X^2_c$  34     33.9   69.3   .01   .04   .75      33.8   60.9   .01   .04   .10      34.4   65.0   .01   .05   .26
          $M_2$    30     31.8   80.6   .02   .08   .00      29.8   56.6   .01   .04   .77      30.1   58.6   .01   .04   .24
24  1500  $X^2$    70     70.9   147.1  .01   .06   .03      70.3   136.0  .01   .05   .46      69.9   138.6  .01   .05   .36
          $X^2_c$  70     70.9   147.3  .01   .06   .03      70.5   137.1  .01   .06   .24      70.4   140.7  .01   .05   .15
          $M_2$    204    204.3  435.9  .01   .06   .58      202.0  369.9  .01   .03   .00      204.6  414.6  .01   .05   .29
Note. n = number of items; N = sample size.
*Empirical rejection rates at α levels 0.01 and 0.05.
Table 4
Selected Simulation Results under the Alternative Hypothesis: Nonnormally Distributed Unidimensional Latent Variable in 2PL Models

                          Equal Slopes and Intercepts   Random Slopes and Intercepts   Dispersed Slopes and Intercepts
                                  Power                         Power                          Power
n   N     Index    df     M      .01   .05   .10       M      .01   .05   .10       M      .01   .05   .10
12  500   $X^2$    10     13.1   .07   .17   .28       ?      .02   ?     .20       10.7   .02   .06   ?
          $X^2_c$  10     13.2   .07   .19   .28       13.0   .04   .17   .27       12.4   .04   .14   .24
          $M_2$    54     53.8   .01   .04   .10       55.2   .02   .07   .14       ?      .02   .08   .13
12  1500  $X^2$    10     19.3   .28   .50   .62       17.1   .16   .39   .54       14.3   .06   .21   .34
          $X^2_c$  10     19.4   .28   .51   .62       18.7   .23   .47   .63       16.4   .13   .34   .48
          $M_2$    54     53.8   .01   .05   .11       55.4   .01   .06   .12       ?      .02   .06   .12
24  1500  $X^2$    22     40.9   .49   .73   .83       38.1   .39   .64   .76       37.2   .36   .62   .74
          $X^2_c$  22     41.2   .50   .73   .84       39.4   .45   .69   .80       38.8   .42   .68   .79
          $M_2$    252    250.5  .01   .04   .10       255.9  .02   .08   .14       ?      .02   .10   .17
Note. n = number of items; N = sample size.
Table 5
Selected Simulation Results under the Alternative Hypothesis: Nonnormally Distributed Unidimensional Latent Variable in Graded Models

                          Equal Slopes and Intercepts   Random Slopes and Intercepts   Dispersed Slopes and Intercepts
                                  Power                         Power                          Power
n   N     Index    df     M      .01   .05   .10       M      .01   .05   .10       M      .01   .05   .10
12  500   $X^2$    34     38.0   .04   .12   .21       40.9   .06   .19   .32       ?      ?     ?     .22
          $X^2_c$  34     38.2   .04   .13   .22       41.4   .07   .21   .33       ?      .08   ?     .27
          $M_2$    30     29.4   .01   .03   .08       29.6   .01   .05   .10       30.0   .01   .06   .10
12  1500  $X^2$    34     45.5   .15   .37   .49       ?      .29   .53   .70       45.8   .14   .37   .51
          $X^2_c$  34     45.6   .16   .37   .49       ?      .31   ?     .71       47.2   .18   .42   .56
          $M_2$    30     31.6   .03   .09   .15       30.3   .01   .06   .11       30.3   .01   .05   ?
24  1500  $X^2$    70     ?      ?     ?     .70       96.8   .38   .66   .79       ?      .29   ?     ?
          $X^2_c$  70     94.0   .33   .57   .70       97.4   .40   .67   .80       94.5   .32   .60   .73
          $M_2$    204    202.5  .01   .04   .09       204.5  .01   .04   .09       206.8  .02   .07   .12
Note. n = number of items; N = sample size.
Table 6
Selected Simulation Results under the Alternative Hypothesis: Correlated Bivariate Normally Distributed Latent Variable in 2PL Models

                          Equal Slopes and Intercepts   Random Slopes and Intercepts   Dispersed Slopes and Intercepts
                                  Power                         Power                          Power
n   N     Index    df     M      .01   .05   .10       M      .01   .05   .10       M      .01   .05   .10
12  500   $X^2$    10     10.0   .02   .05   .09       9.8    .01   .04   .10       9.2    .00   .04   .07
          $X^2_c$  10     ?      .02   .05   .09       10.5   .02   .07   .13       ?      .02   .07   ?
          $M_2$    54     83.9   .54   .74   .81       161.2  1.00  1.00  1.00      142.4  1.00  1.00  1.00
12  1500  $X^2$    10     10.2   .01   .05   ?         10.1   .02   .06   .11       9.4    .00   .04   .07
          $X^2_c$  10     10.2   .01   .05   .11       10.8   .03   .09   .14       10.5   .01   .07   .13
          $M_2$    54     140.8  1.00  1.00  1.00      377.3  1.00  1.00  1.00      320.5  1.00  1.00  1.00
24  1500  $X^2$    22     ?      .01   .06   ?         22.4   .01   .07   .11       22.0   .01   .05   .09
          $X^2_c$  22     21.8   .01   .06   .11       22.9   .02   .08   .13       ?      ?     .07   .12
          $M_2$    252    617.8  1.00  1.00  1.00      1281   1.00  1.00  1.00      1544   1.00  1.00  1.00
Note. n = number of items; N = sample size.
Table 7
Selected Simulation Results under the Alternative Hypothesis: Correlated Bivariate Normally Distributed Latent Variable in Graded Models

                          Equal Slopes and Intercepts   Random Slopes and Intercepts   Dispersed Slopes and Intercepts
                                  Power                         Power                          Power
n   N     Index    df     M      .01   .05   .10       M      .01   .05   .10       M      .01   .05   .10
12  500   $X^2$    34     34.7   .01   .06   .12       34.3   .01   .06   .10       34.1   .01   .05   .10
          $X^2_c$  34     34.7   .01   .06   .12       34.5   .01   .06   .11       34.7   .01   .06   .13
          $M_2$    30     50.3   .42   .60   .68       183.4  1.00  1.00  1.00      205.8  1.00  1.00  1.00
12  1500  $X^2$    34     34.6   .01   .07   .12       34.2   .01   .04   .10       33.1   .01   .03   .08
          $X^2_c$  34     34.6   .01   .07   .12       34.4   .01   .05   .11       34.0   .01   .04   .10
          $M_2$    30     88.2   ?     .90   .92       429.7  1.00  1.00  1.00      455.2  1.00  1.00  1.00
24  1500  $X^2$    70     ?      .02   .07   ?         70.4   .01   .04   .10       70.2   .01   .06   .11
          $X^2_c$  70     71.3   .02   .07   .13       70.6   .01   .05   .10       70.7   .01   .07   .12
          $M_2$    204    554.5  1.00  1.00  1.00      1961   1.00  1.00  1.00      2265   1.00  1.00  1.00
Note. n = number of items; N = sample size.
Table 8
Items from PROMIS Smoking Initiative

Item      Wording
Item 1    Smoking helps me concentrate.
Item 2    Smoking helps me think more clearly.
Item 3    Smoking helps me stay focused.
Item 4    Smoking makes me feel better in social situations.
Item 5    Smoking makes me feel more self-confident with others.
Item 6    Smoking helps me feel more relaxed when I'm with other people.
Item 7    Smoking helps me deal with anxiety.
Item 8    Smoking calms me down.
Item 9    If I'm feeling irritable, a cigarette will help me relax.
Item 10   Smoking a cigarette energizes me.
Item 11   Smoking makes me feel less tired.
Item 12   Smoking perks me up.
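Table A1
First-order Derivatives of Summed Score Likelihoods with Respect to Item 3's Slope Parameter at 5 Rectangular Quadrature Points

Quadrature points                                     -2      -1      0       1       2
Summed score likelihoods after items 1 and 2
$L_2(0\mid\theta)$                                    .658    .423    .195    .061    .014
$L_2(1\mid\theta)$                                    .315    .473    .515    .385    .213
$L_2(2\mid\theta)$                                    .027    .104    .291    .553    .773
Derivatives of tracelines with respect to item 3's slope parameter
$\partial T_3(1\mid\theta)/\partial a_3$              -.063   -.090   .000    .248    .317
$\partial T_3(0\mid\theta)/\partial a_3$              .063    .090    .000    -.248   -.317
First-order derivatives of summed score likelihoods
$\partial L_3(0\mid\theta)/\partial a_3 = L_2(0\mid\theta)\,\partial T_3(0\mid\theta)/\partial a_3$                                                    .041    .038    .000    -.015   -.004
$\partial L_3(1\mid\theta)/\partial a_3 = L_2(1\mid\theta)\,\partial T_3(0\mid\theta)/\partial a_3 + L_2(0\mid\theta)\,\partial T_3(1\mid\theta)/\partial a_3$   -.021   .005    .000    -.080   -.063
$\partial L_3(2\mid\theta)/\partial a_3 = L_2(2\mid\theta)\,\partial T_3(0\mid\theta)/\partial a_3 + L_2(1\mid\theta)\,\partial T_3(1\mid\theta)/\partial a_3$   -.018   -.033   .000    -.042   -.177
$\partial L_3(3\mid\theta)/\partial a_3 = L_2(2\mid\theta)\,\partial T_3(1\mid\theta)/\partial a_3$                                                    -.002   -.009   .000    .137    .245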
Table A2
First-order Derivatives of Summed Score Probabilities with Respect to Item 3's Slope Parameter at 5 Rectangular Quadrature Points

Quadrature points                                     -2      -1      0       1       2
$W(\theta)$                                           .054    .244    .403    .244    .054
First-order derivatives of summed score likelihoods
$\partial L_3(0\mid\theta)/\partial a_3$              .041    .038    .000    -.015   -.004
$\partial L_3(1\mid\theta)/\partial a_3$              -.021   .005    .000    -.080   -.063
$\partial L_3(2\mid\theta)/\partial a_3$              -.018   -.033   .000    -.042   -.177
$\partial L_3(3\mid\theta)/\partial a_3$              -.002   -.009   .000    .137    .245
Weighted derivatives                                                                          Jacobian
$W(\theta)\,\partial L_3(0\mid\theta)/\partial a_3$   .002    .009    .000    -.004   .000    .008
$W(\theta)\,\partial L_3(1\mid\theta)/\partial a_3$   -.001   .001    .000    -.020   -.003   -.023
$W(\theta)\,\partial L_3(2\mid\theta)/\partial a_3$   -.001   -.008   .000    -.010   -.010   -.029
$W(\theta)\,\partial L_3(3\mid\theta)/\partial a_3$   .000    -.002   .000    .033    .013    .044
Figure 1: Latent Variable Distribution for Empirical Data. The probability density of the latent variable estimated with the empirical histogram method (dotted line), superimposed on a standard normal density (solid line).


